Multiple objective scheduling of HPC workloads through dynamic prioritization
Abstract
We have developed an efficient single-queue scheduling system that uses a greedy knapsack algorithm with dynamic job priorities. Our scheduler satisfies high-level objectives while maintaining high utilization of an HPC system or a collection of distributed resources such as a computational grid. We provide a simulation analysis of our approach, comparing it against the scheduling strategies shortest job first, longest-waiting job first, and large jobs first. Further, we look at the effect of system size on total workload response time and find that, for real workloads, the relationship between response time and system size follows an inverse power law. Our approach does not require system administrators or users to identify a specific priority queue for each of their jobs. The proposed scheduler performs an exhaustive parameter search for a per-job priority calculation that balances high-level objectives and provides guaranteed performance for jobs in a workload. The system administrator need only tune the prioritization parameters (knobs) and the scheduler will behave accordingly, for example by reducing wait time for jobs that are above average size with short runtimes. We demonstrate that our approach works very well on workloads that have many independent tasks. We evaluate our scheduler on a realistic mixed scientific data processing workload and on a realistic HPC workload trace from the Parallel Workloads Archive.
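The abstract does not give the priority formula or the packing procedure, so the following is only a minimal sketch of what a greedy knapsack pass over a single queue with tunable priorities could look like. The `Job` fields, the `priority` function, and the weight names `w_wait`, `w_size`, and `w_short` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int       # processors requested
    runtime: float   # estimated runtime in hours
    wait: float      # hours already spent in the queue


def priority(job, w_wait=1.0, w_size=1.0, w_short=1.0):
    """Hypothetical priority (an assumption, not the paper's formula).

    Grows with queue wait and job size, shrinks with estimated runtime.
    """
    return w_wait * job.wait + w_size * job.cores + w_short / job.runtime


def greedy_pack(queue, free_cores, **weights):
    """Greedy knapsack pass: visit jobs in descending priority order and
    start every job that still fits in the remaining free cores."""
    started = []
    ordered = sorted(queue, key=lambda j: priority(j, **weights), reverse=True)
    for job in ordered:
        if job.cores <= free_cores:
            started.append(job)
            free_cores -= job.cores
    return started, free_cores


queue = [
    Job("a", cores=8, runtime=2.0, wait=1.0),
    Job("b", cores=4, runtime=0.5, wait=3.0),
    Job("c", cores=6, runtime=1.0, wait=0.0),
]

# Default weights favour the large job "a"; nothing else fits afterwards.
default_run, default_free = greedy_pack(queue, free_cores=10)

# Setting the size weight to zero favours short, long-waiting jobs instead.
tuned_run, tuned_free = greedy_pack(queue, free_cores=10, w_size=0.0)
```

Because priorities are recomputed from the current wait time on every pass, a job's rank changes as it ages in the queue, which is one simple way to read "dynamic" prioritization. Changing a single weight changes which jobs are started without introducing separate queues.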
Related resources
Towards understanding HPC users and systems: A NERSC case study
The high performance computing (HPC) scheduling landscape is changing. Previously dominated by tightly coupled MPI jobs, HPC workloads are increasingly including high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both application and job level, posing new challenges to classical HPC schedulers. There is a need to underst...
Analysis and Modeling of Social Influence in High Performance Computing Workloads
Shuai Zheng. High Performance Computing (HPC) is becoming a common tool in many research areas. Social influence (e.g., project collaboration) among the growing number of users of HPC systems creates bursty behavior in the underlying workloads. This bursty behavior is increasingly common with the advent of grid computing and clou...
Scalable Resource Management in Cloud Computing
The exponential growth of data and application complexity has brought new challenges to the distributed computing field. Scientific applications are growing more diverse, with workloads ranging from traditional MPI high performance computing (HPC) to fine-grained, loosely coupled many-task computing (MTC). Traditionally, these workloads have been shown to run well on supercomputers and high...
Fully Predictable HPC Infrastructure using Admission Control with Virtualization
Historically, batch scheduling has dominated the management of HPC workloads despite its unpredictability regarding job wait time. Although existing research, such as work on reservations, has partially solved the problem, a fully predictable HPC system remains an elusive goal even as emerging adaptive applications urge its realization. Our earlier study presented a control-theoretic, VM-based approach that achi...
The Effect of an Application Performance Modelling Tool
One of the most important metrics of machine efficiency in HPC is job turnaround time, which is the time between a user submitting a job and receiving the results. This time consists of two primary components: run-time, which depends on the resources allocated to the job, and queue wait time, which depends on the resources requested and the present level of machine usage. This paper inves...